Description

FIELD OF THE INVENTION 

This invention pertains generally to the field of storage and retrieval of data within a database and in particular to a
method for retrieving multimedia data contained within a database.

DESCRIPTION OF THE ART AND PROBLEM 

Multimedia databases are databases that contain digital data of images, music, and video, and are finding
ever-increasing uses in business and entertainment. The widespread use of multimedia databases, however, is intensifying
the need to develop more effective ways of searching such databases.

By way of background, a database is a collection of related data. Oftentimes, the data in a database is structurally
organized as tables, fields and records. Generally, each record in the database possesses a set of attributes and a user
may wish to conduct a search of the database based upon a value of one or more of the attributes. For example, in a
database of football statistics, one may desire to search for all quarterbacks who completed 100 or more passes in a
single season.

In conducting such a search, it is assumed that each record possesses several attributes and the user wants to
search for records that possess certain values of certain attributes. The specification of which records are desired is
called a query and is usually restricted to one of three types: a) a simple query, which specifies a specific value for a
specific attribute, e.g., THROWINGARM = LEFT or THROWINGARM = RIGHT; b) a range query, which specifies a specific
range of values for a specific attribute, e.g., WEIGHT < 220 or 180 < WEIGHT < 220; and c) a Boolean query, such as
(THROWINGARM = LEFT) AND (HEIGHT > 5') AND (COMPLETIONS > 99).
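The three query types can be illustrated with a minimal Python sketch. The record layout, the field names and the three sample players are hypothetical and were invented only for this illustration; the height condition is left out of the Boolean example for brevity.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    throwing_arm: str   # "LEFT" or "RIGHT"
    weight: int         # pounds
    completions: int


# Hypothetical records, invented purely for illustration.
PLAYERS = [
    Player("A", "LEFT", 205, 120),
    Player("B", "RIGHT", 230, 99),
    Player("C", "LEFT", 190, 80),
]


def simple_query(records):
    """Simple query: THROWINGARM = LEFT."""
    return [r for r in records if r.throwing_arm == "LEFT"]


def range_query(records):
    """Range query: 180 < WEIGHT < 220."""
    return [r for r in records if 180 < r.weight < 220]


def boolean_query(records):
    """Boolean query: (THROWINGARM = LEFT) AND (COMPLETIONS > 99)."""
    return [r for r in records
            if r.throwing_arm == "LEFT" and r.completions > 99]
```

Each of these queries presupposes that every record carries explicit attribute values, which is exactly what multimedia data usually lacks.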

Unfortunately, such query-based methods do not solve the problem of searching a multimedia database. This is
because (1) multimedia data is generally not annotated with textual attribute descriptions, and (2) the vocabulary
needed to describe a user's query may not exist or at least not be known to the user. Consequently, the prior art has
shown an increasing interest in developing content-based multimedia retrieval, particularly for image databases. See,
for example, P.M. Kelly and T.M. Cannon, "CANDID: Comparison Algorithm for Navigating Digital Image Databases", in
Proceedings Seventh International Working Conference on Scientific and Statistical Database Management, pp. 252-258,
1994, and J. Barros, J. French, W. Martin, P. Kelly and J.M. White, "Indexing Multispectral Images for Content-Based
Retrieval", in Proceedings of the 23rd AIPR Workshop on Image and Information Systems, Washington DC, Oct. 1994.

Most prior-art, content-based image retrieval systems use some form of query by visual example to facilitate
retrieval of an image from the database. With such a system, successive queries are drawn from a set of images
retrieved by a previous search. This is a crude form of relevance feedback which, unfortunately, exhibits no adaptation.
Each retrieval of an image is in response only to a specific query and all previous retrieval history is ignored. In
these prior-art systems, such as those shown by Flickner et al. in an article entitled "Query by Image and Video
Content: The QBIC System", which appeared in IEEE Computer, 28(9), pp. 23-32, 1995, and an article by K. Hirata et al.
entitled "Query By Visual Example, Content Based Image Retrieval", which appeared in Advances in Database
Technology - EDBT '92, Springer-Verlag, Berlin, 1992, retrieval ability is determined entirely by the similarity metric
employed. Typically, such similarity metrics are a weighted sum over a set of features, the weights often being
determined manually by the user.

Prior-art image retrieval methods and systems, and in particular content-based image retrieval methods and
systems, can exhibit high false positive and false negative retrieval rates that can lead to long delays in finding a
desired image.

SUMMARY OF THE INVENTION 

The above problem is solved and an advance is made over the prior art in accordance with the principles of the
invention wherein a system and method, PicHunter, based on a general Bayesian framework, utilizes relevance feedback
to direct a search. The system maintains a posterior probability distribution that each image in a multimedia database
is the target of the search. This distribution is used to select the next images to display to the user, and the system
solicits selections from the user pertaining to the images displayed. The user's selections are then incorporated into the
posterior distribution via a probabilistic user model. A methodology for building a user model from a database of user
selections is described. A distinguishing characteristic of this framework is that while other relevance feedback methods
refine the query, the method which is the subject of the present invention refines the answer.

BRIEF DESCRIPTION OF THE DRAWINGS 

The teachings of the present invention can be readily understood by considering the following detailed description
in conjunction with the accompanying drawings, in which:

Figure 1 is a pseudo-code listing for a general Bayesian relevance feedback method; 

Figure 2 is a screen image of a representative user interface used in accordance with the teachings of the present
invention;

Figure 3 is a graph showing frequency of image selection for 3 individuals;

Figure 4 is a graph showing frequency of image selection for 3 individuals when trials in which no image was
selected are ignored;

Figure 5 is a pseudo-code listing for a user model;

Figure 6 is a graph showing the percentage of successful searches as a function of search length;

Figures 7a-7d show screen images of four iterations of a single search;

Figure 8 is a pseudo-code listing for an initialization module;

Figure 9 is a pseudo-code listing for a computation module; and

Figure 10 is a flow diagram showing the steps performed when selecting an item for retrieval from a database
according to the teachings of the present invention.

DETAILED DESCRIPTION 

A preferred embodiment of the invention will now be described while referring to the figures, several of which may
be simultaneously referred to during the course of the following description.

Before thoroughly describing the content-based search method and apparatus which is the subject of the present
invention, it is necessary to define several terms. Specifically, at least three classes of search may be identified:

Target search          A target search is one in which a user tries to find a specific target image. For example, an
                       art-history student might need to find a specific painting, or a graphic artist might look for
                       a specific stock photo that he has seen before.

Category search        A category search is one in which a user seeks images from a general category such as
                       "a sunset", "a bird", "a specific kind of car", or "a picture of the Eiffel Tower". The category
                       search is distinct from target search because it places more emphasis on the semantic
                       content of images and often requires subjective judgments.

Open-ended browsing    Open-ended browsing is characterized by a user who has at best a vague idea of what
                       he's looking for. Additionally, the user may repeatedly change his mind during the course
                       of the search. As an example, an interior decorator might begin with one decorating
                       scheme in mind, but end up with a different scheme after seeing images from a design
                       database. The consummate user of an open-ended browsing system may have no problem
                       in mind. He simply enjoys navigating through a database and reacting to what he
                       encounters.

While each of the above three search classes is important to content-based search methods, it is nevertheless
difficult to quantitatively define what correct behavior means for category searching and open-ended browsing. Therefore,
it is necessary to focus on target searching, for which there are several clear metrics for search effectiveness. Two
exemplary metrics are: 1) the total time required to find a target image and, alternatively, 2) the total number of images
examined until the target is found.

With these metrics in mind, one test of the effectiveness of an image-retrieval method and system is the average
difficulty (over many trials) a user experiences in finding a single, randomly selected target image from within a large
database. Those skilled in the art can readily appreciate that several variations of this test are possible. For example, a
target image can be displayed on a computer monitor beside an image-retrieval interface for the duration of a search.
Or the target image may be displayed for only a short time interval before the search begins and then the user must
search for the remembered image. Alternatively, the target image may be distorted, e.g., a hardcopy with color
distortion, from the true image present in the database. This latter procedure may be useful to determine which
distortions/features are important for image retrieval. The difficulty the user experiences in finding a target image may be
quantified in a variety of ways, e.g., the amount of time or the number of user interactions required (mouse clicks,
iterations of the search, keystrokes, etc.).

Now consider a case where a user is searching for a specific datum in a database by means of a series of
display/action iterations. Specifically, let

$I$ denote the database. In the case of PicHunter, $I$ is a set of images, but the basic framework developed here
holds just as well for other types of data.

$I_t \in I$ be the datum being searched for.

$U$ denote the user that is searching for the target $I_t$.

$D_k$ be the information displayed by the system during iteration $k$. In the case of PicHunter, this is a small set of
images; however, the framework still applies if additional information is displayed.

$A$ denote the set of possible actions that the user can take in response to each display. This is defined by the
user interface. It is assumed that $A$ includes some actions that terminate the search.

$a_k \in A$ be the action taken in iteration $k$.

With these assumptions in mind, a question emerges, namely: "given a history of display/action pairs, what is the
probability that a given datum, $I_i \in I$, is equal to the target $I_t$?" To find this probability, Bayes' rule is applied,
expressed in terms of the probability that the user would take the given sequence of actions if the target were $I_i$:

$$P\{I_i \mid a_1 \ldots a_k, D_1 \ldots D_k, U\} \propto P\{I_i \mid D_1 \ldots D_k, U\} \, P\{a_1 \ldots a_k \mid I_i, D_1 \ldots D_k, U\} \qquad (1)$$

It can be assumed that the probability of any given datum being the target is independent of who the user is or what
the sequence of displays has been. Therefore, $P\{I_i \mid D_1 \ldots D_k, U\} = P\{I_i\}$ is simply the prior probability that
$I_i$ was chosen as the target.

Next, assume that the user's action in each iteration depends only on the target and on the current display, i.e., it is
independent of any of the previous iterations.

Making the above two assumptions results in:

$$P\{I_i \mid a_1 \ldots a_k, D_1 \ldots D_k, U\} \propto P\{I_i\} \prod_{k} P\{a_k \mid I_i, D_k, U\} \qquad (2)$$

From Equation (2) it follows that, after each iteration, the change in probability of a given datum being the target can
be computed, if there exists a function for estimating the probability of each possible user action, and in particular:

$$P\{a_k \mid I_i, D_k, U\} = K \, S(a_k, I_i, D_k, U) \qquad (3)$$

where $K$ is an arbitrary constant. Such a function represents a model of user behavior that can be tested and tuned
experimentally, outside the context of an information retrieval system.
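A single update step implied by Equations (2) and (3) can be sketched in Python as follows. This is only an illustrative sketch: the function names, the dictionary representation of the distribution, and the toy user model over integers are assumptions made here and do not come from the figures. Because the constant K cancels on normalization, the sketch simply multiplies by S and renormalizes.

```python
def bayes_update(posterior, action, display, user_model):
    """One relevance-feedback step: multiply each datum's probability by
    S(action, datum, display) and renormalize (Equations 2 and 3)."""
    updated = {
        datum: p * user_model(action, datum, display)
        for datum, p in posterior.items()
    }
    total = sum(updated.values())
    if total == 0:
        raise ValueError("user model assigned zero probability to every datum")
    return {datum: p / total for datum, p in updated.items()}


def toy_user_model(action, target, display):
    """Hypothetical S: the user picks the displayed number closest to the
    target 80% of the time and one of the other displayed numbers otherwise."""
    closest = min(display, key=lambda d: abs(d - target))
    if action == closest:
        return 0.8
    return 0.2 / (len(display) - 1)


prior = {i: 1 / 10 for i in range(10)}      # flat prior over 10 data
posterior = bayes_update(prior, action=2, display=[2, 7],
                         user_model=toy_user_model)
```

In this toy case the user chose 2 over 7, so the data 0 through 4 end up four times as probable as the data 5 through 9.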

With reference to Figure 10, there is shown a flowchart which depicts the processing during a database
search/selection. Specifically, processing begins with desired item 1000. Normally the user of a database retrieval
system will have a desired item in mind and can then search the database for items matching that desired item.
Alternatively, and as taught by the present invention, a "target" desired item may be displayed to the user and the user
then selectively finds an object which matches that target.

Processing proceeds at block 1010 by generating a list of items for display according to a set of rules, e.g., a
probability distribution. In that instance where the list is generated according to a probability distribution, items are
added to the list according to their probability of being closely associated with the desired item.

After the list of items is generated, the items are displayed to the user in block 1020. Subsequently, the user
responds at block 1030 to the items displayed, by identifying which one(s), if any, of the items are desired. If the
desired item is displayed 1040, then the item is found and the processing ends 1050. Otherwise, block 1060 recomputes
the probability distribution as a result of the user response and processing returns to block 1010.

Figure 1 is a pseudocode listing for a relevance-feedback method built around a given user-behavior function, S.
The method shown in Figure 1 maintains the current distribution of probabilities that the data in the database are the
target of the search. In each iteration of the search, the N most probable data are shown to a user, and the user's
response is then used to update the probability distribution. Initially, the probability distribution is initialized with a flat
prior distribution (constant value). Nevertheless, those skilled in the art can readily appreciate that the probability
distribution might just as easily be initialized with a distribution computed from an explicit query, as when used in a
query-based system.
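The overall loop described above (flat prior, display of the N most probable data, update from the user's response) might be sketched as follows. The pseudocode of Figure 1 is not reproduced here, so the function names, the simulated user, and the toy database of integers are assumptions of this sketch. In particular, setting the probability of displayed but unconfirmed data to zero is an assumption of this sketch, motivated by the observation that a user who sees the target terminates the search.

```python
def run_search(database, target, user_model, choose_action,
               n_display=4, max_iters=100):
    """Show the n_display most probable data, stop when the target is on
    screen, otherwise update the distribution from the user's action."""
    prob = {d: 1.0 / len(database) for d in database}      # flat prior
    for iteration in range(1, max_iters + 1):
        display = sorted(prob, key=prob.get, reverse=True)[:n_display]
        if target in display:
            return iteration                 # the user would terminate here
        action = choose_action(target, display)
        for d in prob:
            prob[d] *= user_model(action, d, display)
        for d in display:
            prob[d] = 0.0                    # assumption: shown and rejected
        total = sum(prob.values())
        if total == 0:
            return None
        prob = {d: p / total for d, p in prob.items()}
    return None


def closest(value, display):
    return min(display, key=lambda d: abs(d - value))


def toy_model(action, candidate, display):
    """Hypothetical S: the displayed number closest to the candidate is
    selected with probability 0.8."""
    if action == closest(candidate, display):
        return 0.8
    return 0.2 / (len(display) - 1)


def toy_user(target, display):
    """Simulated user who always selects the number closest to the target."""
    return closest(target, display)


steps = run_search(range(100), target=57,
                   user_model=toy_model, choose_action=toy_user)
```

The toy model is meant only to show the control flow; with such a weak user model, always displaying the most probable data degenerates into a nearly sequential scan of the database.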

The pseudocode listing shown in Figure 1 is quite general. To instantiate the pseudocode in an actual program
requires only that a set of possible actions, A, be defined by designing a user interface and that a reasonable user
model, S, be implemented.

With multimedia image retrieval systems such as that which is the subject of the present invention, it is desirable to
keep the user interface as simple as possible, emphasizing the intelligent use of the information provided by the user.
This approach contrasts sharply with approaches such as those advocated by M. Flickner, H. Sawhney, W. Niblack, J.
Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker in an article entitled
"Query by Image and Video Content: The QBIC System", which appeared in IEEE Computer, 28(9):23-32, 1995, in
which the authors attempted to design a maximally informative user interface.

In contrast, the entire user interface of PicHunter is shown in Figure 2. At any given time during a search, four
images are displayed on the screen and presented to a user. The user can select one or more of the displayed images
by clicking on them with the mouse. Selected images are highlighted by red borders. Images may be deselected by
clicking on them once again.

After selecting zero or more images, the user calls up the next set of four images by hitting the "GO" button. The
user continues selecting images and hitting "GO" until the target appears. At this point the search is terminated by
selecting that image and hitting the "FOUND" button. The "ABORT" button terminates the search before the target
image is found. The set, A, of possible user responses to the images displayed contains 20 elements: sixteen (16)
possible combinations of selected images, plus 4 possible terminations of the search.
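The count of 20 actions can be checked by enumerating them. In the sketch below the tuple encoding of an action is hypothetical, and the reading that the four terminations are "FOUND" applied to each of the four displayed images (with "ABORT" left out) is an assumption, since the text does not spell out how the four terminations arise.

```python
from itertools import combinations

N_DISPLAYED = 4


def enumerate_actions(n=N_DISPLAYED):
    """Return every ("GO", selected_positions) action plus one
    ("FOUND", position) action per displayed image."""
    positions = range(n)
    go_actions = [
        ("GO", frozenset(subset))
        for size in range(n + 1)
        for subset in combinations(positions, size)
    ]
    # Assumption: each termination is "FOUND" on one displayed image.
    found_actions = [("FOUND", pos) for pos in positions]
    return go_actions + found_actions


ACTIONS = enumerate_actions()
```

With four images there are 2^4 = 16 selection subsets, including the empty one, which together with the four terminations gives the 20 elements of A.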

Those skilled in the art can appreciate that the user interface may be enhanced with the addition of richer controls
such as buttons that the user can press to independently indicate whether an image resembles the target's color,
shape, texture, or composition. For example, the system could be implemented with a touch-screen and installed in a
fabric store to help customers find fabric designs. Regardless of the particular user interface employed, the Bayesian
framework described here remains applicable.

To develop the user model, two simplifying assumptions are made. First, it is assumed that all users are identical,
so $U$ can be ignored. If not, then the approach could be applied to individual users or classes of similar users. Second,
it is assumed that the user's behavior will be correlated with a small set of image feature values.

The set of features used in the present invention is finite; nevertheless, those skilled in the art can readily envision
additional features at some future time. Table 1 describes the 18 features presently used. Note that Item #16, Contrast,
has Y0 as a brightness value that one-third of the pixels are below and Y1 as a brightness value that two-thirds of the
pixels are below. Additionally, Item #17, the number of edgels in the image, is computed by first converting the image to
grey-scale, then filtering it with a simple 3x3 Laplacian filter and then thresholding the result at 20%.

Feature   Description

1         Image width as a fraction of the maximum width in the database

2         Image height as a fraction of the maximum height in the database

3-13      Percentages of pixels that fall into the following ranges of HSV colorspace:
            "black"  - V < 3%
            "grey"   - S < 15%, 2% < V < 85%
            "white"  - S < 15%, 80% < V
            "red"    - 10% < S, 5% < V, -70 deg < H < 25 deg
            "orange" - 10% < S, 2% < V, 15 deg < H < 50 deg
            "yellow" - 10% < S, 8% < V, 25 deg < H < 80 deg
            "green"  - 10% < S, 2% < V, 75 deg < H < 185 deg
            "blue"   - 2% < S, 2% < V, 175 deg < H < 260 deg
            "purple" - 10% < S, 2% < V, 255 deg < H < 300 deg
            "brown"  - 5% < S < 85%, 1% < V < 40%, -50 deg < H < 80 deg
            "pink"   - 10% < S < 60%, 2% < V, -70 deg < H < 25 deg

14        Mean saturation

15        Median intensity

16        Contrast

17        Number of "edgels" in the image

18        The same as feature 17, but thresholded at 10%

Table 1: Description of feature set.
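As an illustration of how color features of the kind listed in Table 1 might be computed, the following sketch evaluates three of the eleven HSV ranges ("black", "white" and "red") for a list of RGB pixels. The function names and the pixel representation are assumptions of this sketch. Negative hue bounds are treated as wrapping around the hue circle, which is also an assumption.

```python
import colorsys


def hue_in_range(h_deg, lo, hi):
    """True if hue h_deg (0..360) lies strictly between lo and hi degrees;
    a negative lo is interpreted as wrapping around the hue circle."""
    return lo < h_deg < hi or lo < h_deg - 360 < hi


def color_fractions(pixels):
    """pixels: list of (r, g, b) tuples with components in 0..255.
    Returns the fraction of pixels in the "black", "white" and "red" ranges."""
    counts = {"black": 0, "white": 0, "red": 0}
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        h_deg = h * 360
        if v < 0.03:
            counts["black"] += 1
        if s < 0.15 and v > 0.80:
            counts["white"] += 1
        if s > 0.10 and v > 0.05 and hue_in_range(h_deg, -70, 25):
            counts["red"] += 1
    return {name: c / len(pixels) for name, c in counts.items()}
```

Because several of the ranges in Table 1 overlap, each range is tested independently here, so one pixel may contribute to more than one feature.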

After computing the 18-element feature vectors for each image in the database, an informal experiment was
performed to identify the relationship between features and user actions. In each iteration of the experiment the subject
(user) was shown a display of four randomly-selected images alongside a single, randomly-selected "target" image.
The subject was then asked to select the image from the display that was "closest overall" to the target image. If none
seemed close at all, then the subject was allowed to leave all of the images unselected. If two or more images seemed
equally close, the subject was allowed to select more than one of the images. Surprisingly, a very simple model
corresponds with the results of this and subsequent experiments. This simple model is based on a number of highly
simplifying assumptions:

1. That the probability of the user selecting a given image depends only on the feature values of the target image
and of the other images currently displayed. It is independent of any other images selected.

2. That the user bases his decision on only one feature for each image, and that the chosen feature will often
correspond to exactly one of the computed features.

3. That the probability of a user choosing a given feature is constant. The probability of each feature is denoted as
$W_f$, and is referred to as the feature's weight.

4. That the probability of the user selecting an image is a linear function of the number of other displayed images
that are farther from the target in the chosen feature.

Once these assumptions are made, the probability of a given image being selected is a linear function of:

$$V(D_i, I_t, D) = \sum_{f \in F} W_f \sum_{j \in D} \begin{cases} 1 & \text{if } |f(I_t) - f(D_i)| < |f(I_t) - f(D_j)| \\ 0.5 & \text{if } |f(I_t) - f(D_i)| = |f(I_t) - f(D_j)| \\ 0 & \text{if } |f(I_t) - f(D_i)| > |f(I_t) - f(D_j)| \end{cases} \qquad (4)$$

where $F$ is a set of real-valued functions corresponding to the computed features of the images. For simplicity,
$V(D_i, I_t, D)$ is called the "image score" of image $D_i$.
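A direct Python transcription of the image score of Equation (4) might look as follows. The argument layout and names are assumptions of this sketch. Following the fourth assumption above, which speaks of "other" displayed images, the sketch does not compare an image with itself; that reading is itself an assumption.

```python
def image_score(i, display, target, features, weights):
    """Image score of display[i] per Equation (4).

    display  : list of images currently shown
    target   : hypothesized target image
    features : list of functions f(image) -> float
    weights  : list of feature weights W_f, one per feature
    """
    score = 0.0
    for f, w in zip(features, weights):
        d_i = abs(f(target) - f(display[i]))
        for j, other in enumerate(display):
            if j == i:
                continue        # assumption: skip comparison with itself
            d_j = abs(f(target) - f(other))
            if d_i < d_j:
                score += w      # image i is closer to the target than image j
            elif d_i == d_j:
                score += 0.5 * w
    return score
```

For instance, with images represented as tuples of two numbers, `features = [lambda im: im[0], lambda im: im[1]]` and unit weights, `image_score(0, [(1, 0), (5, 0), (9, 3)], (2, 3), features, [1, 1])` counts two wins on the first feature and one tie on the second.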

Figure 3 shows the relative frequency with which images were selected by three different subjects as a function of
their image scores computed in this manner using the 18 features described in Table 1, and $W_f = 1$. For comparison,
the dotted line shows the performance of a purely random "user". Figure 3 clearly illustrates correlation between image
scores and frequency of selection.

The primary difference between the users is the frequency with which they selected no images at all. If all the trials
in which no images were selected are ignored, then the performance of the three subjects is strikingly similar, as
illustrated in Figure 4. This suggests that subjects first decided whether or not any image should be selected at all, and
that different subjects had different thresholds for this judgement. But, once the decision to select an image was made,
the rest of the procedure is accurately modeled by Equation 4.

Using "image scores" and conceding that the model does not apply when no images are selected, the S function
is computed according to the procedure whose pseudocode is shown in Figure 5. When the user selects no images,
this procedure returns a constant value, independent of its inputs, which will leave the probability distribution of the
database unchanged. When the user selects one or more images, the procedure finds the probability of the user
selecting each image according to its image score and returns the product of the probabilities of all selected images
times the product of one minus the probabilities of all unselected images.
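The procedure just described can be sketched in Python as follows. The pseudocode of Figure 5 is not reproduced here, so the function name, the argument layout, and the choice of 1.0 as the constant returned for an empty selection are assumptions of this sketch. The per-image selection probabilities are taken as given inputs.

```python
def user_model_s(selected, select_probs, no_selection_value=1.0):
    """Likelihood of a selection pattern under one hypothesized target.

    selected     : set of display positions the user selected
    select_probs : probability of selecting each displayed image, derived
                   from its image score for the hypothesized target
    """
    if not selected:
        # The model does not apply: return the same constant for every
        # target, so the distribution is unchanged after normalization.
        return no_selection_value
    likelihood = 1.0
    for pos, p in enumerate(select_probs):
        likelihood *= p if pos in selected else (1.0 - p)
    return likelihood
```

For selection probabilities 0.9, 0.2 and 0.4, selecting only the first image gives 0.9 x 0.8 x 0.6, while selecting the first and third gives 0.9 x 0.8 x 0.4.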

According to the simplifying assumptions, the probability of an image being selected should be a linear function of
its score. However, such a function may place too much trust in the model. If a linear function of the image score were
used, then extreme image scores would lead to probabilities that would have to be clipped at 0 or 1. Unfortunately,
however, users make mistakes. Therefore, extreme probabilities could cause the correct target to be completely
eliminated from the search. To avoid this, a sigmoid is fit to the experimental data, which is nearly linear in the range of
the data, but which never quite reaches extreme probabilities. The parameters of the sigmoid, M and σ, were set by
hand to yield a reasonable fit to the data.
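One common sigmoid with two parameters is the logistic function, sketched below. The text does not give the exact functional form or the fitted values, so the logistic form, the parameter names `m` and `sigma`, and any values passed to them are assumptions of this sketch.

```python
import math


def selection_probability(score, m, sigma):
    """Logistic curve in the image score: roughly linear near score == m,
    and strictly between 0 and 1 for the scores that occur in practice."""
    return 1.0 / (1.0 + math.exp((m - score) / sigma))
```

Here `m` is the score at which the probability equals one half and `sigma` controls how quickly the curve saturates. Because the output never reaches exactly 0 or 1 for moderate scores, a single mistaken selection cannot eliminate the true target from the distribution.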

The only remaining task is to choose values for the 18 feature weights, $W_f$, such that they fit the experimental data
reasonably well. To gather more data for this step, a version of PicHunter was implemented in which all weights were
set to unity. A number of searches was then performed, and each user's actions were recorded. The measured
performance of this version of PicHunter was only slightly worse than the performance of the optimized version.

After verifying that users' behavior during these searches matched the behavior shown in Figure 4, this data was
combined with the data collected initially.

Next, the correlation coefficients between the recorded selections and image scores computed with each of the 18
features alone were determined. The features were then sorted in decreasing order of correlation and the weight for the
feature with the highest correlation was set to 1, while the weights for the other features were initialized to 0. A weight
for each successive feature, in decreasing order of correlation, was identified by trying 100 weights between the values
of 0 and 2, and choosing the one which gave the overall best correlation with the data. The resulting weights are shown
in Table 2.
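The weight-fitting procedure can be sketched as a greedy, one-feature-at-a-time search. The data layout (one list of single-feature scores per feature, plus a 0/1 selection label per observation), the function names, and the use of 100 evenly spaced candidates including both endpoints are assumptions of this sketch.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def fit_weights(feature_scores, selections, steps=100, max_weight=2.0):
    """feature_scores[f][n]: score of observation n using feature f alone.
    selections[n]: 1 if the user selected that image, else 0."""
    n_features = len(feature_scores)

    def combined_corr(weights):
        totals = [
            sum(w * feature_scores[f][n] for f, w in enumerate(weights))
            for n in range(len(selections))
        ]
        return pearson(totals, selections)

    # Sort features by decreasing single-feature correlation.
    order = sorted(range(n_features),
                   key=lambda f: pearson(feature_scores[f], selections),
                   reverse=True)
    weights = [0.0] * n_features
    weights[order[0]] = 1.0                  # best feature gets weight 1
    candidates = [max_weight * k / (steps - 1) for k in range(steps)]
    for f in order[1:]:
        # Try each candidate weight for feature f, keeping the others fixed.
        weights[f] = max(
            candidates,
            key=lambda w: combined_corr(weights[:f] + [w] + weights[f + 1:]))
    return weights
```

A feature that is uncorrelated with the selections only adds variance to the combined score, so this search leaves its weight at 0.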



Weight   Value

W1       0.0223
W2       0.1362
W?       0.0201
W?       0.0647
W12      0.0335
W13      0.0112
W14      0.0893
W15      0.0826
W16      0.0491
W17      0.0134
W18      0.0536

Table 2: Feature weights



To test PicHunter, a commercial database of over 20,000 stock images on 201 compact disks (CDs) was
acquired. Four thousand, five hundred, twenty-two (4522) of the images were transferred to a hard disk at a resolution
of 128x192 pixels and the 18 features contained in Table 1 were precomputed.

PicHunter was tested on this database by several subjects who conducted a number of searches. The first subject
was one of the inventors of the present application and this first set of results represents the performance that can be
achieved by a user with maximal expertise, that is, a user whose behavior is likely to match the S function. The second
subject was a systems programmer who was uninvolved in the development of the system. This second set of results
represents the performance of a computer expert who has no special expertise in the use of the system. Lastly, the
system was briefly tested by eight other subjects, who performed just one or two searches each. This third set of results
represents the performance of people with normal computer literacy and no knowledge of the system whatsoever.

                                        Expert User    Computer Expert    Computer Literates

Number of searches                      20             24                 11
Number of targets found                 20             19                 6
Median search length                    22             65                 234
Mean successful search length           53             57                 75
Standard dev. of successful searches    78             40                 86

Table 3: Test results for various subjects.



Each of the testers was given only a minimal amount of instruction, consisting only of the following:

1. Select the image that's most similar to the target by clicking on it. A selected image has a red border.

2. If you can't decide between two or more images, you can select more than one.

3. If none of the images looks at all close to the target, or if all of them look equally close, don't select any of them.

4. If you change your mind about which image(s) is (are) closest, you can deselect images by clicking on them
again.

5. Once you've selected the best image(s), click on the "GO" button to get the next set of images.

6. When the target image appears, select it and then hit the "FOUND" button.

Of course, these instructions are highly uninformative and alternative instructions would include hints about how
best to work with the system. The results of all the tests are shown in Table 3. Figure 6 graphs the results for the two
computer experts, showing the percentage of searches that were completed successfully within a given number of
iterations. The dotted line indicates the corresponding percentages that would be expected if the database was searched
at random.

The results from the two computer experts clearly show that the system is a substantial improvement over random
search in spite of the simplicity of the user interface and precomputed image features. On average, images were
successfully found after just over 55 iterations, or 220 images viewed out of 4522. The simplicity of the user interface makes
it possible for an experienced user to spend just one or two seconds on each iteration, so an average search lasts only
a few minutes.

The results from the third set were not as good. Only half of their searches were successful and the mean length of a
successful search was 75 iterations. Nevertheless, this is significantly better than random chance.

One possible reason for the difficulty encountered by the third group of users is that this group often focused on
features which the system does not measure. For example, one such user, who was searching for a picture of a
flamingo, reported that she was basing her selections on whether images contained curves matching the curve of the
flamingo's neck. Inasmuch as the present feature set contains no measure of curvature whatsoever, it is not surprising
that the image was not found. A more comprehensive feature set will likely improve the performance for such novice
users.

As stated previously, the current user interface was chosen to be as simple as possible. Specifically, the system
displays a set of images (currently 4) and the user selects a subset of them by clicking on them with a selection device,
e.g., a mouse. While this interface is readily understandable by a user, it provides the search engine with rather
impoverished information about the user's desires. More complex interfaces which allow the user to provide the engine
more precise information are easily imagined.

For example, a simple extension to the current interface would allow a user to indicate the strength with which a
displayed image matches his goals. Additionally, mechanisms that allow the user to indicate the features on which a
match occurs would likely prove useful. A more complex enhancement to the user interface would allow the user to
selectively identify those portions of a presented image which are particularly relevant. The user could simply click on
the relevant image regions or could "lasso" them with the mouse. Such an enhancement would allow the user to be
quite specific while not requiring him to learn a complex query language or drawing tool.

It is desirable to choose displays that maximize the expected reduction in entropy which will result from Bayesian
update of the database probability distribution. This is equivalent to maximizing the mutual information between user
actions and target images. By way of example, if $T$ is the target image, drawn from a distribution that has been
determined by the search up to the present moment, and $A$ is the user's action, drawn from the distribution determined
by the prediction algorithm, then it is desirable to find the $D$ which maximizes:

$$I(T; A \mid D) = -\sum_{T} P\{T\} \log P\{T\} - \sum_{A} P\{A \mid D\} \log P\{A \mid D\} + \sum_{T,A} P\{T, A \mid D\} \log P\{T, A \mid D\}$$

For a given display, the mutual information between actions and target images may be computed in $O(M \times S_N)$
time, where $M$ is the number of images in the database, and $S_N$ is the complexity of estimating the probabilities of user
actions on a display of $N$ images.
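The mutual information between actions and targets for one display can be computed from the current distribution over targets and the user model's action probabilities, as in the following sketch. The list-of-lists representation of the action probabilities and the function name are assumptions of this sketch.

```python
import math


def mutual_information(prior, action_probs):
    """Mutual information between target and action for one display.

    prior        : list of target probabilities, one per image
    action_probs : action_probs[t][a] is the probability of action a
                   given that image t is the target, for this display
    """
    n_actions = len(action_probs[0])
    # Marginal action distribution: sum over targets of P(T) * P(A | T, D).
    p_action = [
        sum(p_t * row[a] for p_t, row in zip(prior, action_probs))
        for a in range(n_actions)
    ]

    def plogp(p):
        return p * math.log(p) if p > 0 else 0.0

    h_target = -sum(plogp(p) for p in prior)
    h_action = -sum(plogp(p) for p in p_action)
    h_joint = -sum(plogp(p_t * row[a])
                   for p_t, row in zip(prior, action_probs)
                   for a in range(n_actions))
    return h_target + h_action - h_joint
```

With two equally likely targets, a display whose action reveals the target exactly yields log 2 nats, while a display whose action is independent of the target yields zero.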

It is important to use all possible user actions in the calculation of mutual information, including actions that
terminate the search. If the target image is one of the images in the display, then the user has a 100% chance of taking the
corresponding termination action and a 0% chance of doing anything else. The effect of this is that Bayesian update
after a termination action reduces the entropy to 0 and the mutual information resulting from a given display reflects the
probability that the display contains the target image.

Unfortunately, a brute-force approach to finding the display which gives the highest mutual information would be
$O(M \times S_N \times (M \text{ choose } N))$. As such, even when $M$ and $N$ are trivially small, a brute-force approach is prohibitive.

To reduce the $O(M \text{ choose } N)$ part, a 'greedy' heuristic is used. First, the best single image to display is chosen.
This will simply be the one that has the highest probability of being the target. Next, the particular one image that will
provide the most informative 2-image display is added. Note that this step requires $O(M \times M \times S_2)$ computation,
since it is necessary to compute mutual information given each of the images in the database. Repeating N times to
get N images, then, takes

$$O\left(\left(\sum_{i=1}^{N} S_i\right) \times M^2\right)$$

amount of time. While this is far better, it is still prohibitive when $M$ is large, as it will be in any database of real interest.
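The greedy heuristic can be sketched generically, with the information measure passed in as a function. The name `greedy_display` and the callback signature are assumptions of this sketch; in the method described above the callback would be the mutual information of the partial display, whereas the toy measure used in the usage note is only a stand-in.

```python
def greedy_display(candidates, n, info):
    """Build an n-image display one image at a time, each time adding the
    candidate that maximizes info(display + [candidate])."""
    display = []
    remaining = list(candidates)
    while len(display) < n and remaining:
        best = max(remaining, key=lambda c: info(display + [c]))
        display.append(best)
        remaining.remove(best)
    return display
```

As a hypothetical usage, with `prob = {1: 0.4, 2: 0.35, 5: 0.2, 9: 0.05}` and a stand-in measure `lambda d: sum(prob[c] for c in d) + 0.1 * (max(d) - min(d))`, the call `greedy_display(prob, 2, ...)` first picks the most probable item and then prefers a dissimilar second item over the runner-up in probability.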

To get into a reasonable complexity range, some of the images may be ignored as possible candidates for display.
At first glance, it would seem that the candidates for display should just be the C images with the highest probability of
being the target, where C is some manageable constant. But the top C images are often quite similar to each other,
and obtaining useful information requires display of one or two images with lower probabilities for the sake of comparison.
Therefore the top C_1 images are used, plus C_2 images that are spaced evenly through the remaining list. Considering
only these images for display gives a complexity of

$$O\left(\left(\sum_{i=1}^{N} S_i\right) \times (C_1 + C_2) \times M\right).$$
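
One way to build such a candidate pool is sketched below in Python. The function name `candidate_pool` and the particular rule for spacing the C_2 extra images (taking the midpoint of each of C_2 equal-length slices of the remaining list) are assumptions made for illustration; the text only requires that the extra images be spaced evenly.

```python
def candidate_pool(p_target, c1, c2):
    """Return indices of the top c1 images plus c2 images spaced evenly
    through the rest of the probability-sorted list."""
    ranked = sorted(range(len(p_target)), key=lambda i: p_target[i], reverse=True)
    top, rest = ranked[:c1], ranked[c1:]
    if c2 <= 0 or not rest:
        return top
    if c2 >= len(rest):
        return top + rest
    step = len(rest) / c2
    # Take the middle element of each of the c2 equal-length slices of `rest`.
    spaced = [rest[int(k * step + step / 2)] for k in range(c2)]
    return top + spaced
```

With eight images already in decreasing order of probability, `candidate_pool(p, 2, 3)` keeps images 0 and 1 and adds images 3, 5, and 7 from the tail.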



This can be further improved by using an approximate method to estimate the mutual information. The method is
divided into two stages: an initialization stage, of complexity O(M log M), and a computation stage, of complexity
O(K × S_N), where K is a value chosen to yield acceptable precision. The initialization stage need be computed only once for
a given distribution of T. Then I(T; A|D) can be estimated for various possible D's without redoing the initialization.

With reference to Figure 8, the method selects a set of K sample images. The probability that the target image is
one of the images between two adjacent samples, sample[j] and sample[j + 1], is 1/K ("between" here means between
them in the sorted list tList). Note that an image whose probability is higher than 1/K may appear more than once on
the list of samples.

In the computation stage shown in Figure 9, it is assumed that the list of samples is a complete list of all the possible
images, and each has a probability of 1/K.

Using this approximation of mutual information, and the heuristics described above, a set of images can be chosen 
for display in 

$$O\left(\left(\sum_{i=1}^{N} S_i\right) \times (C_1 + C_2) \times K\right)$$

amount of time. Since C_1, C_2, and K can be chosen to be much smaller than M, this is a significant improvement.

While the invention has been shown and described in detail in the context of a preferred embodiment, it will be
apparent to those skilled in the art that variations and modifications are possible without departing from the broad principles
and spirit of the invention. Specifically, while the invention has only been described with respect to searching
image databases, virtually all of the concepts are relevant to other media, in particular audio and video databases and
3-D models.

Claims

1. An interactive method for finding items in a database, said method comprising the steps of:

generating a list of N items according to a procedure based upon a probability distribution over the items in the
database;

displaying said list of N items to a user;

capturing a user response to the displayed items through an interactive user-interface, wherein said response
expresses a relation between the displayed items and an item being searched for;

recomputing the probability distribution over items in the database according to the user response; and

repeating the above steps until the item being searched for is one of the N items displayed.

2. The method according to claim 1 wherein said generating step generates a list of N most probable items. 

3. The method according to claim 1 wherein said generating step generates the lists of N items such that any mutual
information between the user response and the item searched for is maximized.

4. The method according to claim 3 in which the list of N items to be displayed is chosen by:

(i) placing the highest probability item on the list;
(ii) using a subset of the items in the database, finding an item which when added to the list results in a combination
having the highest estimated mutual information;
(iii) adding the item to the list; and

repeating the above steps (ii)-(iii) until the list contains N items.

5. The method according to claim 3 wherein the method of estimating mutual information comprises the steps of:

initializing, in decreasing order of probability, a complete list of possible targets;
choosing from the list a set of sample targets having equally spaced probabilities; and
computing the mutual information between the sample targets and the user response where each of the sample
targets has equal prior probability.

6. The method according to claim 1 wherein said capturing step further includes the user responding to displayed
items by identifying the items that are most similar to the item being searched for.

7. The method according to claim 1 wherein said capturing step further includes the user responding to the displayed
items by indicating a degree of similarity between the displayed items and the item being searched for.

8. The method according to claim 1 wherein said capturing step further includes the user responding to displayed
items by indicating aspects or components of the displayed items that are similar to the item being searched for.

9. The method according to claim 1 wherein said recomputing step includes computing a new probability distribution
over items in the database from the user response according to Bayes' rule and an algorithmic model of the user
wherein said algorithmic model computes a probability distribution over response, conditioned on items displayed
and items being searched for.

10. The method according to claim 9 wherein: the algorithmic model of the user employs a set of precomputed features
of the items in the database. 

11. The method according to claim 9 wherein: the algorithmic model of the user employs a set of manually entered
features of the items in the database.

12. The method according to claim 9 wherein: the algorithmic model of the user assigns an independent probability of
the user identifying each displayed item as closest to the item being searched for, and computes the probability of
any given combination of indications as the product of the probabilities of the indicated items times the product of
one minus the non-indicated items.

13. The method according to claim 12 wherein: the probability of each displayed item being identified as closest to the
item being searched for is computed as the product of the probabilities that the user will consider that displayed
item to be closer than the other displayed items.

14. The method according to claim 10, wherein the database is a database of images and one of the precomputed
features is a histogram of basic colors.



15. The method according to claim 1 wherein the database is a database of images.

16. The method according to claim 1 wherein the database is a database of sounds.

17. The method according to claim 1 wherein the database is a database of video clips.

18. The method according to claim 1 wherein the database is a database of 2d shapes.

19. The method according to claim 1 wherein the database is a database of 3d shapes.

20. The method according to claim 1 wherein the database is a database of text.

21. An interactive method for measuring the effectiveness of a database search, said method comprising the steps of:

generating a target object for which to be searched;

displaying said target object and a set of candidate objects to a user;

measuring the search effectiveness by evaluating one or more metrics of the difficulty the user has in finding the
target; and

repeating the above steps while averaging the measures of difficulty.

For each datum, I(i), in the database
    Initialize the probability of I(i), P(i), to 1/(size of database)
Let D = N randomly selected data from I
Display D
Get the user's response 'a'
While 'a' doesn't terminate the search
    For each datum, I(i), in D
        P(i) = 0    // We know that none of the data we just showed the user
                    // is the one he's looking for.
    For each datum, I(i), in the database
        P(i) = P(i) * S( a, I(i), U )
    For each datum, I(i), in the database
        P(i) = P(i) / sum(P)
    Let D = the N data with the highest probabilities
    Display D
    Get the user's response 'a'

Figure 1
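
The body of the loop in Figure 1 might be written in Python roughly as follows. This is a sketch under assumptions: `bayes_update` and `most_probable` are hypothetical names, probabilities are held in a dictionary keyed by item, and `user_model(action, item, displayed)` is a hypothetical stand-in for the function S of the figure.

```python
def bayes_update(p, displayed, action, user_model):
    """One pass of the loop body: rule out displayed items, reweight, renormalize.

    p maps each item to its probability of being the target.
    user_model(action, item, displayed) is a hypothetical stand-in for S.
    """
    new_p = {}
    for item, prob in p.items():
        if item in displayed:
            new_p[item] = 0.0    # shown, and the search did not terminate
        else:
            new_p[item] = prob * user_model(action, item, displayed)
    total = sum(new_p.values())
    if total == 0:
        raise ValueError("every item has zero probability after the update")
    return {item: prob / total for item, prob in new_p.items()}

def most_probable(p, n):
    """Return the n items with the highest probabilities, best first."""
    return sorted(p, key=p.get, reverse=True)[:n]
```

A driver loop would call `bayes_update` after each non-terminating response and then display `most_probable(p, n)`.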
Figure 3

[Figure 4 graphic not reproduced; axis label: Image "score"]

Figure 4
S is the estimated probability being computed
D is the set of images
ai is 1 if the user selected D(i), 0 if not
I is the (hypothesized) target image

S = 1
If no images were selected
    S = 1
Else
    For each displayed image, D(i), in D
        V = 0
        For each feature f in F
            For each D(j) in D, D(j) != D(i)
                If abs( f(I) - f(D(i)) ) < abs( f(I) - f(D(j)) )
                    V = V + Wf
                Else if abs( f(I) - f(D(i)) ) == abs( f(I) - f(D(j)) )
                    V = V + .5 * Wf
        P = 1. / (1 + exp( (M - V) / Sigma ))
        If ai == 1
            S = S * P
        Else
            S = S * (1 - P)

Figure 5
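
A Python rendering of the Figure 5 procedure is sketched below for illustration. The function name `action_probability`, the use of dictionaries of scalar feature values, and the parameter names `weights`, `m`, and `sigma` (for Wf, M, and Sigma) are assumptions of this example.

```python
import math

def action_probability(target, display, selected, features, weights, m, sigma):
    """Estimated probability of the user's selections, given a hypothesized target.

    target and each display entry map feature names to scalar values;
    selected[i] is True when the user picked display[i].
    m and sigma are the offset and scale of the sigmoid.
    """
    if not any(selected):
        return 1.0
    s = 1.0
    for i, d_i in enumerate(display):
        v = 0.0    # weighted count of comparisons that favour display[i]
        for f in features:
            dist_i = abs(target[f] - d_i[f])
            for j, d_j in enumerate(display):
                if j == i:
                    continue
                dist_j = abs(target[f] - d_j[f])
                if dist_i < dist_j:
                    v += weights[f]
                elif dist_i == dist_j:
                    v += 0.5 * weights[f]
        p = 1.0 / (1.0 + math.exp((m - v) / sigma))
        s *= p if selected[i] else (1.0 - p)
    return s
```

Selections that favour the displayed image nearer to the hypothesized target receive a higher probability than selections that favour the farther one.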
INITIALIZATION:

Let tList[1] ... tList[M] be a complete list of possible
    targets, sorted in decreasing order of probability
p = 0
i = 1
For j = 1 to K
{
    nextP = (j - 1) / (K - 1)
    While p + P{ tList[i] } < nextP
    {
        p = p + P{ tList[i] }
        i = i + 1
    }
    sample[j] = tList[i]
}

Figure 8
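
The initialization stage of Figure 8 can be sketched in Python as shown here. The sketch uses zero-based indexing, so `j / (k - 1)` plays the role of nextP; the name `choose_samples` and the bounds check on `i` (a guard against floating-point rounding near the end of the list) are additions made for this example.

```python
def choose_samples(t_list, prob, k):
    """Pick k samples at equally spaced cumulative probabilities.

    t_list is sorted in decreasing order of probability; prob maps each
    entry to its probability. Requires k >= 2.
    """
    samples = []
    p = 0.0    # probability mass of the entries already passed
    i = 0
    for j in range(k):
        next_p = j / (k - 1)
        # Advance until the cumulative mass through t_list[i] reaches next_p.
        while i < len(t_list) - 1 and p + prob[t_list[i]] < next_p:
            p += prob[t_list[i]]
            i += 1
        samples.append(t_list[i])
    return samples
```

For probabilities 0.6, 0.2, 0.1, 0.1 and k = 5, the most probable image is sampled three times, which illustrates how a high-probability image can appear more than once on the list of samples.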
Let pAD[1] ... pAD[R] = 0, where R is the number of
    possible actions
Let A[1] ... A[R] be a complete list of possible actions
Hta = 0
For j = 1 to K
{
    For a = 1 to R
    {
        p = (1/K) * P{ A[a] | sample[j], D }
        If p > 0
        {
            pAD[a] = pAD[a] + p
            Hta = Hta - p * log p
        }
    }
}

Ha = 0
For a = 1 to R
{
    If pAD[a] > 0
    {
        Ha = Ha - pAD[a] * log pAD[a]
    }
}

I = (log K) + Ha - Hta

Figure 9
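
The computation stage of Figure 9 translates fairly directly into Python, as sketched below. The name `estimate_information` and the callback `action_prob(a, sample)`, standing in for P{ A[a] | sample[j], D }, are assumptions of this example.

```python
import math

def estimate_information(samples, num_actions, action_prob):
    """Approximate I(T; A|D), treating the K samples as the only possible
    targets, each with prior probability 1/K.

    action_prob(a, sample) is a hypothetical stand-in for P{ A[a] | sample, D }.
    """
    k = len(samples)
    p_ad = [0.0] * num_actions    # marginal probability of each action
    h_ta = 0.0                    # joint entropy of target and action
    for sample in samples:
        for a in range(num_actions):
            p = (1.0 / k) * action_prob(a, sample)
            if p > 0:
                p_ad[a] += p
                h_ta -= p * math.log(p)
    h_a = -sum(p * math.log(p) for p in p_ad if p > 0)
    # log K is the entropy of the uniform prior over the K samples.
    return math.log(k) + h_a - h_ta
```

When the action splits four equally likely samples into two equal groups, the estimate is log 2; when the action is independent of the sample, the estimate is 0.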